Skip to content

feat(jdbc): extend OpenTelemetry instrumentation for metadata and pagination#12918

Open
keshavdandeva wants to merge 6 commits intojdbc/feature-branch-otelfrom
jdbc/add-otel-to-metadata
Open

feat(jdbc): extend OpenTelemetry instrumentation for metadata and pagination#12918
keshavdandeva wants to merge 6 commits intojdbc/feature-branch-otelfrom
jdbc/add-otel-to-metadata

Conversation

@keshavdandeva
Copy link
Copy Markdown
Contributor

@keshavdandeva keshavdandeva commented Apr 24, 2026

b/491245568

Key Changes

Core Instrumentation Logic

  • Database Metadata Tracing: Added OTel spans to key methods in BigQueryDatabaseMetaData.java (getCatalogs, getSchemas, getTables, getColumns) to capture underlying API calls.
  • Pagination Span Links: Captured the parent span context at the start of fetchNextPages in BigQueryStatement.java and linked background pagination spans back to it, avoiding timeline anomalies.
  • Cross-Thread Context Propagation: Stored the SpanContext in BigQueryBaseResultSet.java at creation time and made it current during next() in BigQueryJsonResultSet.java and BigQueryArrowResultSet.java to survive thread hops.
  • Tracer Reuse: Extracted getSafeTracer to BigQueryJdbcOpenTelemetry.java as a static utility to ensure consistent fallback behavior across the driver.
  • Lambda Extraction: Extracted the large lambda function in populateArrowBufferedQueue in BigQueryStatement.java to its own private method processArrowStream to improve readability and maintainability.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates OpenTelemetry tracing into the BigQuery JDBC driver, focusing on ResultSet iteration and DatabaseMetaData operations. Key changes include capturing span contexts during result set initialization and propagating them during next() calls, as well as adding tracing to metadata methods like getTables, getColumns, and getSchemas. Review feedback identifies significant performance overhead from wrapping high-frequency next() methods in telemetry scopes and points out that spans for asynchronous metadata operations end prematurely, leading to inaccurate duration metrics and fragmented traces.

@keshavdandeva
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates OpenTelemetry tracing across the BigQuery JDBC driver, specifically targeting metadata operations and result set processing. Key changes include wrapping blocking operations in appropriate tracing scopes, propagating span contexts to background threads, and standardizing tracer acquisition via a new utility method. Feedback focuses on refining the tracing implementation, such as correctly handling background span parents to avoid timeline anomalies, ensuring span statuses are updated on errors, and replacing redundant null checks on SpanContext with validity checks.

@keshavdandeva keshavdandeva marked this pull request as ready for review April 24, 2026 16:25
@keshavdandeva keshavdandeva requested review from a team as code owners April 24, 2026 16:25
// Advance the cursor. Potentially blocking operation.
BigQueryArrowBatchWrapper batchWrapper = this.buffer.take();
BigQueryArrowBatchWrapper batchWrapper;
try (Scope scope = Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not including entire block in the span? Line 261 performs deserialization, so it's still part of the operation

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so, earlier I did that but then Gemini review said:

The next() method is a high-frequency operation in JDBC, often called millions of times during result set iteration. Wrapping the entire method body in an OpenTelemetry Scope introduces significant overhead due to ThreadLocal access and object allocations for every single row. This is particularly inefficient when next() is simply incrementing an index for rows already available in memory (e.g., when isNested is true or when iterating within the current Arrow batch). Consider moving the context propagation to only the specific sections where blocking operations or external API calls occur, such as the block starting at line 243 where buffer.take() is called. Ensure that the scope is managed in an exception-safe manner (e.g., using try-with-resources) to prevent resource leaks.

But I guess, for Arrow, as this is batch of rows, maybe its fine

try {
// Advance the cursor,Potentially blocking operation
this.cursor = this.buffer.take();
try (Scope scope = Context.current().with(Span.wrap(originalSpanContext)).makeCurrent()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for Arrow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment from Gemini about this as well:

Similar to the Arrow implementation, wrapping the entire next() method in a Scope creates substantial performance overhead for every row processed. Since next() is a hot path, the cost of managing the OpenTelemetry context for every iteration can lead to a noticeable regression in throughput for large result sets. It is recommended to limit the scope of context propagation to the parts of the method that actually perform work requiring context, such as the blocking buffer.take() call. Ensure the scope is closed in an exception-safe manner to prevent leaks.

.addLink(parentSpanContext)
.startSpan();

try (Scope scope = backgroundSpan.makeCurrent()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move startSpan() there too? So no need for manual cleanup at the end

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this, and unfortunately, the OTel does not make the Span interface AutoCloseable. Only the Scope returned by span.makeCurrent() is AutoCloseable to handle thread-local cleanup. That is why we are forced to manually call span.end() in the finally block

@keshavdandeva keshavdandeva requested a review from logachev April 27, 2026 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants